Abstract

This paper describes tactical generation in 
Turkish, a free constituent order language, in 
which the order of the constituents may change 
according to the information structure of the 
sentences to be generated. In the absence 
of any information regarding the information 
structure of a sentence (i.e., topic, focus, back- 
ground, etc.), the constituents of the sentence 
obey a default order, but the order is almost 
freely changeable, depending on the constraints 
of the text flow or discourse. We have used 
a recursively structured finite state machine 
for handling the changes in constituent or- 
der, implemented as a right-linear grammar 
backbone. Our implementation environment 
is the GenKit system, developed at Carnegie 
Mellon University-Center for Machine Transla- 
tion. Morphological realization has been imple- 
mented using an external morphological analy- 
sis/generation component which performs con- 
crete morpheme selection and handles mor- 
phographemic processes. 

Introduction 

Natural Language Generation is the operation 
of producing natural language sentences us- 
ing specified communicative goals. This pro- 
cess consists of three main kinds of activities 
(McDonald, 1987): 

• the goals the utterance is to obtain must be 
determined, 

• the way the goals may be obtained must be 
planned, 

• the plans should be realized as text. 

Tactical generation is the realization, as linear
text, of contents that are usually specified using
some kind of feature structure generated by a
higher level process such as text planning, or
transfer in machine translation applications. In
this process, a generation grammar and a
generation lexicon are used.

As a component of a large scale project 
on natural language processing for Turkish, we 
have undertaken the development of a gener- 
ator for Turkish sentences. In order to im- 
plement the variations in the constituent or- 
der dictated by various information structure 
constraints, we have used a recursively struc- 
tured finite state machine instead of enumerat- 
ing grammar rules for all possible word orders. 
A second reason for this approach is that many 
constituents, especially the arguments of verbs 
are typically optional and dealing with such 
optionality within rules proved to be rather 
problematic. Our implementation is based on 
the GenKit environment developed at Carnegie 
Mellon University-Center for Machine Translation.
GenKit supports writing a context-free backbone
grammar along with feature structure constraints
on the non-terminals.

The paper is organized as follows: The 
next section presents relevant aspects of con- 
stituent order in Turkish sentences and fac- 
tors that determine it. We then present an 
overview of the feature structures for represent- 
ing the contents and the information structure 
of these sentences, along with the recursive fi- 
nite state machine that generates the proper 
order required by the grammatical and infor- 
mation structure constraints. Later, we give 
the highlights of the generation grammar ar- 
chitecture along with some example rules and 
sample outputs. We then present a discussion
comparing our approach with similar work on
Turkish generation, and conclude with some
final comments.



Turkish 

In terms of word order, Turkish can be
characterized as a subject-object-verb (SOV)
language in which constituents at certain phrase
levels can change order rather freely, depend- 
ing on the constraints of text flow or discourse. 
The morphology of Turkish enables morpho- 
logical markings on the constituents to sig- 
nal their grammatical roles without relying on 
their order. This, however, does not mean that 
word order is immaterial. Sentences with dif- 
ferent word orders reflect different pragmatic 
conditions, in that the topic, focus and
background information conveyed by such sentences
differ.¹ Information conveyed through intonation,
stress and/or clefting in fixed word order
languages such as English, is expressed in Turk- 
ish by changing the order of the constituents. 
Obviously, there are certain constraints on con- 
stituent order, especially, inside noun and post- 
positional phrases. There are also certain con- 
straints at sentence level when explicit case 
marking is not used (e.g., with indefinite direct 
objects). 

In Turkish, the information which links the 
sentence to the previous context, the topic, is 
in the first position. The information which is 
new or emphasized, the focus, is in the imme- 
diately preverbal position, and the extra infor- 
mation which may be given to help the hearer 
understand the sentence, the background, is 
in the post verbal position (Erguvanli, 1979). 
The topic, focus and background information, 
when available, alter the order of constituents 
of Turkish sentences. In the absence of any 
such control information, the constituents of 
Turkish sentences have the default order: 

subject, expression of time, expression of 
place, direct object, beneficiary, source, 
goal, location, instrument, value designa- 
tor, path, duration, expression of manner, 
verb. 

All of these constituents except the verb are 
optional unless the verb obligatorily subcate- 
gorizes for a specific lexical item as an object 
in order to convey a certain (usually idiomatic) 
sense. The definiteness of the direct object 
adds a minor twist to the default order. If the 
direct object is an indefinite noun phrase, it has 
to be immediately preverbal. This is because
neither the subject nor the indefinite direct
object has surface case-marking that
distinguishes them, so word order constraints
come into play to force this distinction.

¹ See Erguvanli (1979) for a discussion of the
function of word order in Turkish grammar.
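The ordering principles described above (topic first, focus immediately preverbal, background postverbal, everything else in the default order) can be illustrated with a small Python sketch. This is only a simplified illustration under stated assumptions, not the recursive finite state machine used in the generator: the function name `linearize`, the role labels, and the assignment of 'otobüsle' to instrument and '3 dakikada' to duration are hypothetical choices made for the example, and the indefinite direct object constraint, capitalization and punctuation are left out. The sample call uses the words of example (1).

```python
# Default constituent order as listed in the text (the verb is handled separately).
DEFAULT_ORDER = [
    "subject", "time", "place", "dir-obj", "beneficiary", "source",
    "goal", "location", "instrument", "value", "path", "duration", "manner",
]


def linearize(constituents, verb, topic=None, focus=None, background=None):
    """Order surface strings by information structure.

    `constituents` maps role names from DEFAULT_ORDER to surface strings;
    every role is optional. Topic goes first, focus immediately before the
    verb, background after the verb, the rest follows DEFAULT_ORDER.
    """
    used = set()

    def take(role):
        # Emit a role at most once, and only if it is actually present.
        if role in constituents and role not in used:
            used.add(role)
            return [constituents[role]]
        return []

    first = take(topic)
    last = take(background)
    preverbal = take(focus)
    middle = [word for role in DEFAULT_ORDER for word in take(role)]
    return " ".join(first + middle + preverbal + [verb] + last)


example = {
    "subject": "Ahmet", "time": "bugün", "source": "evden",
    "goal": "okula", "instrument": "otobüsle", "duration": "3 dakikada",
}
neutral = linearize(example, "gitti")
# neutral == "Ahmet bugün evden okula otobüsle 3 dakikada gitti"
marked = linearize(example, "gitti", topic="time", focus="subject")
# marked == "bugün evden okula otobüsle 3 dakikada Ahmet gitti"
```

With no control information the sketch reproduces the neutral order, and with the time expression as topic and the subject as focus it reproduces the marked order.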

In order to present the flavor of word order
variations in Turkish, we provide the following
examples. These two sentences describe the same
event (i.e., have the same logical form), but
they are used in different discourse situations.
The first sentence presents constituents in a
neutral default order, while in the second
sentence 'bugün' (today) is the topic and
'Ahmet' is the focus:²

(1)
a. Ahmet bugün evden    okula      otobüsle 3 dakikada   gitti.
   Ahmet today home+ABL school+DAT bus+WITH 3 minute+LOC go+PAST+3SG
   'Ahmet went from home to school by bus in 3 minutes today.'

b. Bugün evden    okula      otobüsle 3 dakikada   Ahmet gitti.
   today home+ABL school+DAT bus+WITH 3 minute+LOC Ahmet go+PAST+3SG
   'It was Ahmet who went from home to school in 3 minutes by bus today.'

Although sentences (b) and (c) in the following
example are both grammatical, (c) is not
acceptable as a response to the question (a):

(2)
a. Ali nereye    gitti?
   Ali where+DAT go+PAST+3SG
   'Where did Ali go?'

b. Ali okula      gitti.
   Ali school+DAT go+PAST+3SG
   'Ali went to school.'

c. * Okula      Ali gitti.
     school+DAT Ali go+PAST+3SG
     'It was Ali who went to school.'

² In the glosses, 3SG denotes third person
singular verbal agreement, P1PL and P3SG denote
first person plural and third person singular
possessive agreement, WITH denotes a derivational
marker making adjectives from nouns, LOC, ABL,
DAT, GEN denote locative, ablative, dative, and
genitive case markers, PAST denotes past tense,
and INF denotes a marker that derives an
infinitive form from a verb.



The word order variations exemplified by (2)
are very common in Turkish, especially in
discourse.



Generation of Free Word Order 
Sentences 

The generation process gets as input a feature 
structure representing the content of the sen- 
tence where all the lexical choices have been 
made, then produces as output the surface form 
of the sentence. The feature structures for sen- 
tences are represented using a case-frame rep- 
resentation. Sentential arguments of verbs ad- 
here to the same morphosyntactic constraints 
as the nominal arguments (e.g., the participle 
of, say, a clause that acts as a direct object 
is case-marked accusative, just as the nomi- 
nal one would be). This enables a nice recur- 
sive embedding of case-frames of similar gen- 
eral structure to be used to represent sentential 
arguments. 

In the next sections, we will highlight rel- 
evant aspects of our feature structures for sen- 
tences and their constituents. 



Simple Sentences 

We use the case-frame feature structure in
Figure 1 to encode the contents of a sentence.³
We use the information given in the CONTROL 
feature to guide our grammar in generating 
the appropriate sentential constituent order. 
This information is exploited by a right linear 
grammar (recursively structured nevertheless) 
to generate the proper order of constituents 
at every sentential level (including embedded 
sentential clauses with their own information 
structure). The simplified outline of this right 
linear grammar is given as a finite state ma- 
chine in Figure 2. Here, transitions are labeled 
by constraints and constituents (shown in bold 
face along a transition arc) which are gener- 
ated when those constraints are satisfied. If 
any transition has a NIL label, then no surface 
form is generated for that transition. 

The recursive behavior of this finite state 
machine comes from the fact that the individ- 
ual argument or adjunct constituents can also 
embed sentential clauses. Sentential clauses 



³ Here, c-name denotes a feature structure for
representing noun phrases or case-frames
representing embedded sentential forms which can
be used as nominal or adverbial constituents.



[ S-FORM       infinitive/adverbial/participle/finite
  CLAUSE-TYPE  existential/attributive/predicative
  VOICE        active/reflexive/reciprocal/passive/causative
  SPEECH-ACT   imperative/optative/necessitative/wish/
               interrogative/declarative
  QUES         [ TYPE   yes-no/wh
                 CONST  list-of(subject/dir-obj/etc.) ]
  VERB         [ ROOT      verb
                 POLARITY  negative/positive
                 TENSE     present/past/future
                 ASPECT    progressive/habitual/etc.
                 MODALITY  potentiality ]
  ARGS         [ SUBJECT      c-name
                 DIR-OBJ      c-name
                 SOURCE       c-name
                 GOAL         c-name
                 LOCATION     c-name
                 BENEFICIARY  c-name
                 INSTRUMENT   c-name
                 VALUE        c-name ]
  ADJN         [ TIME      c-name
                 PLACE     c-name
                 MANNER    c-name
                 PATH      c-name
                 DURATION  c-name ]
  CONTROL      [ TOPIC   constituent
                 FOCUS   constituent
                 BACKGR  constituent ] ]


Figure 1: The case-frame for Turkish sentences. 



correspond to either full sentences with non- 
finite or participle verb forms which act as noun 
phrases in either argument or adjunct roles, 
or gapped sentences with participle verb forms 
which function as modifiers of noun phrases 
(the filler of the gap). The former (non-gapped)
forms can be further classified in Turkish into
those representing acts, facts and adverbials.
The latter (gapped form) is linked to the filler 
noun phrase by the ROLES feature in the struc- 
ture for noun phrase (which will be presented in 
the following sections): this feature encodes the 
(semantic) role filled by the filler noun phrase 
and the case-frame of the sentential clause. The 
details of the feature structures for sentential 
clauses are very similar to the structure for the 
case-frame. Thus, when an argument or ad- 
junct, which is a sentential clause, is to be re- 
alized, the clause is recursively generated by 
using the same set of transitions. For example, 
the verb 'gör' (see) takes a direct object which
can be a sentential clause: 



(3)
Ayşe'nin gelişini      görmedim.
Ayşe+GEN come+INF+P3SG see+NEG+PAST+1SG
'I did not see Ayşe's coming.'



Similarly, the subject or any other constituent
of a sentence can also be a sentential clause:

(4)
Ali'nin buraya gelmesi       bizim  işi     bitirmemizi         kolaylaştırdı.
Ali+GEN here   come+INF+P3SG we+GEN the.job finish+INF+P1PL+ACC make_easy+PAST+3SG
'Ali's coming here made us finish the job easier.'

In all these cases, the main sentence generator
also generates the sentential subjects and
objects, in addition to generating the main
sentence.



Complex Sentences

Complex sentences are combinations of simple
sentences (or complex sentences themselves)
which are linked by either conjoining or various
relationships like conditional dependence,
cause-result, etc. The generator works on a
feature structure representing a complex
sentence, which may be in one of the following
forms:

• a simple sentence. In this case the sentence
has the case-frame as its argument feature
structure.

    [ TYPE  simple
      ARG   case-frame ]

• a series of simple or complex sentences
connected by coordinating or bracketing
conjunctions. Such sentences have feature
structures which have the individual case-frames
as the values of their ELEMENTS features:

    [ TYPE      conj
      CONJ      and/or/etc.
      ELEMENTS  list-of(complex-sentence) ]

• sentences linked with a certain relationship.
Such sentences have the feature structure:

    [ TYPE           linked
      LINK-RELATION  rel
      ARG1           complex-sentence
      ARG2           complex-sentence ]
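
The three forms of a complex sentence (simple, conj and linked) nest recursively, so a generator has to walk them recursively as well. The following Python sketch shows one way such a walk might look, under the assumption that feature structures are represented as plain dictionaries keyed by the feature names used in this section. The function name `case_frames` and the placeholder values `"cf1"`, `"cf2"`, `"cf3"` are hypothetical, and the sketch only collects the embedded case-frames in left-to-right order rather than realizing them as text.

```python
def case_frames(sentence):
    """Yield the case-frames inside a complex-sentence structure, left to right."""
    kind = sentence["TYPE"]
    if kind == "simple":
        # A simple sentence carries its case-frame in ARG.
        yield sentence["ARG"]
    elif kind == "conj":
        # Conjoined sentences: each element is itself a complex sentence.
        for element in sentence["ELEMENTS"]:
            yield from case_frames(element)
    elif kind == "linked":
        # Linked sentences: two complex sentences joined by LINK-RELATION.
        yield from case_frames(sentence["ARG1"])
        yield from case_frames(sentence["ARG2"])
    else:
        raise ValueError(f"unknown sentence TYPE: {kind!r}")


# Hypothetical input: a cause-result link whose second part is a conjunction.
complex_sentence = {
    "TYPE": "linked",
    "LINK-RELATION": "cause-result",
    "ARG1": {"TYPE": "simple", "ARG": "cf1"},
    "ARG2": {
        "TYPE": "conj",
        "CONJ": "and",
        "ELEMENTS": [
            {"TYPE": "simple", "ARG": "cf2"},
            {"TYPE": "simple", "ARG": "cf3"},
        ],
    },
}
frames = list(case_frames(complex_sentence))
# frames == ["cf1", "cf2", "cf3"]
```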



Issues in Representing Noun Phrases 

In this section we will briefly touch on relevant
aspects of the representation of noun phrases. 
We use the following feature structure (sim- 
plified by leaving out irrelevant details) to de- 
scribe the structure of a noun phrase: 



[ REF    [ ARG      basic-concept
           CONTROL  [ DROP  +/- (default -) ] ]
  CLASS  classifier
  ROLES  role-type
  MODE   [ MOD-REL    list-of(mod. relation)
           ORDINAL    [ POSITION     pos
                        INTENSIFIER  +/- ]
           QUANT-MOD  quantifier
           QUALY-MOD  list-of(simple-property)
           CONTROL    [ EMPHASIS  quant./qual. ] ]
  SPEC   [ DET       [ QUANTIFIER   quant.
                       DEFINITE     +/-
                       REFERENTIAL  +/-
                       SPECIFIC     +/- ]
           SET-SPEC  list-of(c-name)
           SPEC-REL  list-of(spec. relation)
           DEMONS    demonstrative ]
  POSS   [ ARGUMENT  c-name
           CONTROL   [ DROP  +/-
                       MOVE  +/- ] ] ]



The order of constituents in noun phrases
is rather strict at a gross level, i.e.,
specifiers almost always precede modifiers and
modifiers almost always precede classifiers,⁴
which precede the head noun, although there are
numerous exceptions. Also, within each group, word
order variation is possible due to a number of 
reasons: 

• The order of quantitative and qualitative
modifiers may change: the aspect that is
emphasized is closer to the head noun. The
indefinite singular determiner may also follow
any qualitative modifier and immediately
precede any classifier and/or head noun.

⁴ A classifier in Turkish is a nominal modifier
which forms a noun-noun noun phrase, essentially
the equivalent of book in forms like book cover
in English.

• Depending on the determiner used, the po- 
sition of the demonstrative specifier may be 
different. This is a strictly lexical issue and 
not explicitly controlled by the feature struc- 
ture, but by the information (stored in the 
lexicon) about the determiner used. 

• The order of lexical and phrasal modi- 
fiers (e.g., corresponding to a postpositional 
phrase on the surface) may change, if po- 
sitioning the lexical modifier before the 
phrasal one causes unnecessary ambiguity 
(i.e., the lexical modifier in that case can 
also be interpreted as a modifier of some in- 
ternal constituent of the phrasal modifier). 
So, phrasal modifiers always precede lexical 
modifiers and phrasal specifiers precede lex- 
ical specifiers, unless otherwise specified, in 
which case punctuation needs to be used. 

• The possessor may scramble to a position 
past the head or even outside the phrase (to 
a background position), or allow some adver- 
bial adjunct intervene between it and the rest 
of the noun phrase, causing a discontinuous 
constituent. Although we have included con- 
trol information for scrambling the possessor 
to post head position, we have opted not to 
deal with either discontinuous constituents 
or long(er) distance scrambling as these are 
mainly used in spoken discourse. 

Furthermore, since the possessor information
is explicitly marked on the head noun, if the
discourse does not require an overt possessor,⁵
it may be dropped by suitable setting of the
DROP feature.

⁵ For example, (c) cannot be used as an answer
to (a) in the following discourse, where the
owner of the book should be emphasized:

a. Kimin kitabı    kalın?
   whose book+P3SG thick
   'Whose book is thick?'

b. Benim kitabım   kalın.
   I+GEN book+P1SG thick
   'My book is thick.'

c. * Kitabım   kalın.
     book+P1SG thick

Interfacing with Morphology

As Turkish has complex agglutinative word
forms with productive inflectional and
derivational morphological processes, we handle
morphology outside our system using the
generation component of a full-scale
morphological analyzer of Turkish (Oflazer, 1993).
Within GenKit, we generate relevant abstract
morphological features such as agreement,
possessive and case markers for nominals, and
voice, polarity, tense, aspect, mood and
agreement markers for verbal forms. This
information is properly ordered at the interface
and sent to the morphological generator, which
then:

1. performs concrete morpheme selection,
dictated by the morphotactic constraints and
morphophonological context,

2. handles morphographemic phenomena such
as vowel harmony, and vowel and consonant
ellipsis, and

3. produces an agglutinative surface form.
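
To give a feel for what the morphological generator does with an abstract marker, the following Python toy realizes three case markers on a noun stem. It is a hedged sketch, not the analyzer/generator of Oflazer (1993): the suffix templates, the function names `last_vowel` and `attach`, and the restriction to lowercase stems are assumptions made for illustration, and only vowel harmony, d/t alternation and the buffer consonant y are modeled (stem-final consonant changes and ellipsis are ignored).

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOICELESS = set("çfhkpsşt")

# Abstract case markers mapped to suffix templates (illustrative assumption):
# A = a/e by vowel harmony, D = d/t by voicing, (y) = buffer after a vowel.
SUFFIX_TEMPLATES = {"LOC": "DA", "ABL": "DAn", "DAT": "(y)A"}


def last_vowel(stem):
    """Return the last vowel of a lowercase stem."""
    for ch in reversed(stem):
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            return ch
    raise ValueError(f"no vowel in stem: {stem!r}")


def attach(stem, marker):
    """Attach the concrete form of an abstract case marker to a noun stem."""
    template = SUFFIX_TEMPLATES[marker]
    ends_in_vowel = stem[-1] in BACK_VOWELS | FRONT_VOWELS
    if template.startswith("(y)"):
        # The buffer consonant surfaces only after a vowel-final stem.
        template = ("y" if ends_in_vowel else "") + template[3:]
    harmonic = "a" if last_vowel(stem) in BACK_VOWELS else "e"
    dental = "t" if stem[-1] in VOICELESS else "d"
    return stem + template.replace("A", harmonic).replace("D", dental)


forms = [attach("masa", "LOC"), attach("ev", "ABL"), attach("okul", "DAT")]
# forms == ["masada", "evden", "okula"]
```

Keeping this logic outside the sentence grammar means the grammar rules only need to emit markers such as LOC or DAT in the right order.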

Grammar Architecture and 
Output 

Our generation grammar is written in a formal- 
ism called Pseudo Unification Grammar im- 
plemented by the GenKit generation system 
(Tomita and Nyberg, 1988). Each rule consists 
of a context-free phrase structure description 
and a set of feature constraint equations, which 
are used to express constraints on feature
values. Non-terminals in the phrase structure part
of a rule are referenced as x0, ..., xn in the
equations, where x0 corresponds to the
non-terminal on the left hand side, and xn is the
n-th non-terminal on the right hand side. Since
the context-free rules are directly compiled into 
tables, the performance of the system is es- 
sentially independent of the number of rules, 
but depends on the complexity of the feature 
constraint equations (which are compiled into 
LISP code). Currently, our grammar has 273 
rules each with very simple constraint checks. 
Of these 273 rules, 133 are for sentences and 
107 are for noun phrases. 

To implement the sentence level genera- 
tor (described by the finite state machine pre- 
sented earlier), we use rules of the form 

Si → XP Sj

where Si and Sj denote states in the finite
state machine and XP denotes the constituent to
be realized while taking this transition. If this
XP corresponds to a sentential clause, the same
set of rules is recursively applied. This is a
variation of the method suggested by Takeda et
al. (1991).



The following are rule examples that implement
some of the transitions from state 0 to
state 1:



(<S> <==> (<S1>)
     (((x0 control topic) =c *undefined*)
      (x1 = x0)))

(<S> <==> (<Subject> <S1>)
     (((x0 control topic) =c subject)
      (x2 = x0)
      ((x2 arguments subject) = *remove*)
      (x1 = (x0 arguments subject))))

(<S> <==> (<Time> <S1>)
     (((x0 control topic) =c time)
      (x2 = x0)
      ((x2 adjuncts time) = *remove*)
      (x1 = (x0 adjuncts time))))



The grammar also has rules for realizing a
constituent like <Subject> or <Time> (which
may eventually call the same rules if the
argument is sentential) and rules like those
above for traversing the finite state machine
from state 1 on.


Examples 



In this section, we provide feature structures
for three example sentences which only differ
in their information structures. Although the
following feature structures seem very similar,
they correspond to different surface forms.⁶

⁶ A feature value in curly brackets indicates
that the feature has as its value a c-name
structure for the noun phrase inside the curly
brackets.

(5)
[ S-FORM       finite
  CLAUSE-TYPE  predicative
  VOICE        active
  SPEECH-ACT   declarative
  VERB         [ ROOT    #bırak
                 SENSE   positive
                 TENSE   past
                 ASPECT  perfect ]
  ARGUMENTS    [ SUBJECT   {Ahmet}
                 DIR-OBJ   {kitap}
                 LOCATION  {masa} ]
  ADJUNCTS     [ TIME  {dün} ] ]

Ahmet dün       kitabı   masada    bıraktı.
Ahmet yesterday book+ACC table+LOC leave+PAST+3SG
'Ahmet left the book on the table yesterday.'

(6)
[ S-FORM       finite
  CLAUSE-TYPE  predicative
  VOICE        active
  SPEECH-ACT   declarative
  VERB         [ ROOT    #bırak
                 SENSE   positive
                 TENSE   past
                 ASPECT  perfect ]
  ARGUMENTS    [ SUBJECT   {Ahmet}
                 DIR-OBJ   {kitap}
                 LOCATION  {masa} ]
  ADJUNCTS     [ TIME  {dün} ]
  CONTROL      [ TOPIC  time
                 FOCUS  subject ] ]

Dün       kitabı   masada    Ahmet bıraktı.
yesterday book+ACC table+LOC Ahmet leave+PAST+3SG
'It was Ahmet who left the book on the table yesterday.'

(7)
[ S-FORM       finite
  CLAUSE-TYPE  predicative
  VOICE        active
  SPEECH-ACT   declarative
  VERB         [ ROOT    #bırak
                 SENSE   positive
                 TENSE   past
                 ASPECT  perfect ]
  ARGUMENTS    [ SUBJECT   {Ahmet}
                 DIR-OBJ   {kitap}
                 LOCATION  {masa} ]
  ADJUNCTS     [ TIME  {dün} ]
  CONTROL      [ TOPIC       time
                 FOCUS       subject
                 BACKGROUND  location ] ]

Dün       kitabı   Ahmet bıraktı        masada.
yesterday book+ACC Ahmet leave+PAST+3SG table+LOC
'It was Ahmet who left the book yesterday on the table.'



Figure 3 shows the path the generator follows
while generating sentence 7. The solid lines
show the transitions that the generator makes
in its right linear backbone.

Comparison with Related Work

Dick (1993) has worked on a classification-based
language generator for Turkish. His goal was to
generate Turkish sentences of varying
complexity from input semantic representations
in Penman's Sentence Planning Language (SPL).
However, his generator is not complete, in that
noun phrase structures in their entirety,
postpositional phrases, word order variations,
and many morphological phenomena are not
implemented. Our generator differs from his in
various aspects: We use a case-frame based
input representation, which we feel is more
suitable for languages with free constituent
order. Our coverage of the grammar is
substantially higher than the coverage presented
in his thesis, and we also use a full-scale
external morphological generator to deal with
the complex morphological phenomena of
agglutinative lexical forms of Turkish, which he
attempted to embed into the sentence generator
itself.

Hoffman, in her thesis (Hoffman, 1995a;
Hoffman, 1995b), has used the Multiset-
Combinatory Categorial Grammar formalism
(Hoffman, 1992), an extension of Combinatory
Categorial Grammar to handle free word order
languages, to develop a generator for Turkish.
Her generator also uses relevant features of
the information structure of the input and can
handle word order variations within embedded
clauses. She can also deal with scrambling out
of a clause dictated by information structure
constraints, as her formalism allows this in a
very convenient manner. The word order
information is lexically kept as multisets
associated with each verb. She has demonstrated
the capabilities of her system as a component of
a prototype database query system. We have
been influenced by her approach to incorporate
information structure in generation, but, since
our aim is to build a wide-coverage generator
for Turkish for use in a machine translation
application, we have opted to use a simpler
formalism and a very robust implementation
environment.

Conclusions

We have presented the highlights of our work
on tactical generation in Turkish, a free
constituent order language with agglutinative
word structures. In addition to the content
information, our generator takes as input the
information structure of the sentence (topic,
focus and background) and uses these to select
the appropriate word order. Our grammar uses
a right-linear rule backbone which implements
a (recursive) finite state machine for dealing
with alternative word orders. We have also
provided for constituent order and stylistic
variations within noun phrases based on certain
emphasis and formality features. We plan to use
this generator in a prototype transfer-based
human-assisted machine translation system from
English to Turkish.